Phonetic labeling and segmentation of mixed-lingual prosody databases

Authors

  • Harald Romsdorfer
  • Beat Pfister
Abstract

An automatic system is presented for segmenting the speech signals used to train statistical prosody models. Starting from a canonical transcription, the system simultaneously delivers an accurate phonetic segmentation and the matching phonetic transcription, which indicates pronunciation variants. Although the system is HMM-based, it uses only the speech signals of the prosody database itself, which typically consists of a few hundred sentences with a total duration of some 30 minutes. Initial phone HMMs are generated by flat-start training on the canonical transcriptions of the sentences. A Viterbi search for the best-matching pronunciation variants and HMM retraining are then applied iteratively until convergence.
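The variant-selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that per-frame log-likelihoods for each phone are already available (in a real system they would come from the trained phone HMMs), models each phone as a single state without transition probabilities, and uses the hypothetical names `forced_align`, `best_variant`, and `frames`. Retraining is omitted; in the full procedure the chosen variants would be used to re-estimate the models and the search would be repeated.

```python
import math

def forced_align(loglik, phones):
    """Viterbi alignment of frames to a left-to-right phone sequence.

    loglik[t][p] is the log-likelihood of frame t under phone p
    (assumed input; a real system would compute it from phone HMMs).
    Returns (score, segments), segments being (phone, start, end) tuples
    with frame indices and an exclusive end.
    """
    T, N = len(loglik), len(phones)
    if N == 0 or T < N:
        return -math.inf, []          # sequence cannot be aligned
    NEG = -math.inf
    # delta[t][i]: best score for frames 0..t with frame t in position i
    delta = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    delta[0][0] = loglik[0][phones[0]]
    for t in range(1, T):
        for i in range(N):
            stay = delta[t - 1][i]
            move = delta[t - 1][i - 1] if i > 0 else NEG
            if move > stay:
                best, back[t][i] = move, i - 1
            else:
                best, back[t][i] = stay, i
            if best > NEG:
                delta[t][i] = best + loglik[t][phones[i]]
    # Backtrace from the last phone at the last frame.
    i = N - 1
    path = [i]
    for t in range(T - 1, 0, -1):
        i = back[t][i]
        path.append(i)
    path.reverse()
    # Collapse the frame-level path into phone segments.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((phones[path[start]], start, t))
            start = t
    return delta[T - 1][N - 1], segments

def best_variant(loglik, variants):
    """Pick the pronunciation variant whose alignment scores highest."""
    scored = [(forced_align(loglik, v), v) for v in variants]
    (score, segments), variant = max(scored, key=lambda item: item[0][0])
    return variant, score, segments

# Toy data: four frames, three phones (values are made up for illustration).
frames = [
    {"a": -1.0, "b": -5.0, "c": -5.0},
    {"a": -1.0, "b": -5.0, "c": -5.0},
    {"a": -5.0, "b": -1.0, "c": -5.0},
    {"a": -5.0, "b": -5.0, "c": -1.0},
]
variant, score, segments = best_variant(frames, [["a", "c"], ["a", "b", "c"]])
print(variant, score, segments)
```

Here the canonical sequence and a variant with the middle phone deleted compete, and the alignment score decides which one is kept together with its segment boundaries.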

Similar resources

Polyglot speech prosody control

Within a polyglot text-to-speech synthesis system, the generation of an adequate prosody for mixed-lingual texts, sentences, or even words, requires a polyglot prosody model that is able to seamlessly switch between languages and that applies the same voice for all languages. This paper presents the first polyglot prosody model that fulfills these requirements and that is constructed from indep...

Cross-Lingual Voice Conversion

Cross-lingual voice conversion refers to the automatic transformation of a source speaker’s voice to a target speaker’s voice in a language that the target speaker cannot speak. It involves a set of statistical analysis, pattern recognition, machine learning, and signal processing techniques. This study focuses on the problems related to cross-lingual voice conve...

Automatic analysis of prosody for multi-lingual speech corpora. Daniel Hirst

This chapter outlines a general approach and describes a set of tools for the automatic analysis of multilingual speech corpora. Two levels of representation can be derived automatically: a phonetic representation, which provides an extremely close copy of the original speech signal, and a surface phonological representation, which reduces the variability to a small number of discrete values wi...

Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics

This paper presents a novel non-native speech synthesis technique that preserves the individuality of a non-native speaker. Crosslingual speech synthesis based on voice conversion or Hidden Markov Model (HMM)-based speech synthesis is a technique to synthesize foreign language speech using a target speaker’s natural speech uttered in his/her mother tongue. Although the technique holds promise t...

Automatic Labeling of Corpora for Speech

One of the bottlenecks in the development of text-to-speech synthesizers based on segment concatenation is the need for large, segmented and labeled corpora. Consequently, as manual segmentation and labeling is a tedious and time consuming task, there is a strong demand for automatic labeling systems which can label speech from many languages. Several systems have been proposed already, but the...

Publication date: 2005